R Markdown
Through EDA, we found that Class A and Class B fires are by far the most frequent fire size classes in California. Fires of those sizes may not be a threat to the environment or to property, since they tend to die out quickly. Fires of Class C and above can be dangerous because the burned area grows fast.
Each day from 1992 to 2015 is labeled according to whether a large fire occurred. Each observation has three features: the daily temperature, soil moisture, and rainfall, averaged over California.
## 'data.frame': 23376 obs. of 5 variables:
## $ time : chr "1950-01-01 00:00:00+00:00" "1950-01-02 00:00:00+00:00" "1950-01-03 00:00:00+00:00" "1950-01-04 00:00:00+00:00" ...
## $ tair_day_livneh_vic: num 4.14 1.77 -3.09 -3.59 -2.75 ...
## $ month : chr "01" "01" "01" "01" ...
## $ year : chr "1950" "1950" "1950" "1950" ...
## $ DOY : chr "001" "002" "003" "004" ...
## 'data.frame': 23376 obs. of 6 variables:
## $ year : int 1950 1950 1950 1950 1950 1950 1950 1950 1950 1950 ...
## $ DOY : int 1 2 3 4 5 6 7 8 9 10 ...
## $ tair_day_livneh_vic : num 4.14 1.77 -3.09 -3.59 -2.75 ...
## $ month : chr "01" "01" "01" "01" ...
## $ soilmoist1_day_livneh_vic: num 18.3 18.5 18.3 18.3 18.1 ...
## $ rainfall_day_livneh_vic : num 0.67193 0.79325 0.07883 0.08991 0.00174 ...
## 'data.frame': 8036 obs. of 8 variables:
## $ year : int 1992 1992 1992 1992 1992 1992 1992 1992 1992 1992 ...
## $ DOY : int 1 2 3 4 5 6 7 8 9 10 ...
## $ tair_day_livneh_vic : num 3.78 4.17 4.19 4.87 4.94 ...
## $ month : chr "01" "01" "01" "01" ...
## $ soilmoist1_day_livneh_vic: num 20.2 20.5 21.4 23.6 25.1 ...
## $ rainfall_day_livneh_vic : num 0.0616 1.1337 3.0754 7.0797 10.8035 ...
## $ n : num 0 0 0 0 0 0 0 0 0 0 ...
## $ fire : num 0 0 0 0 0 0 0 0 0 0 ...
## 'data.frame': 8036 obs. of 8 variables:
## $ year : num 1992 1992 1992 1992 1992 ...
## $ DOY : num 2 3 4 5 6 7 8 9 10 11 ...
## $ tair_day_livneh_vic : num 3.78 4.17 4.19 4.87 4.94 ...
## $ month : chr "01" "01" "01" "01" ...
## $ soilmoist1_day_livneh_vic: num 20.2 20.5 21.4 23.6 25.1 ...
## $ rainfall_day_livneh_vic : num 0.0616 1.1337 3.0754 7.0797 10.8035 ...
## $ n : num 0 0 0 0 0 0 0 0 0 0 ...
## $ fire : num 0 0 0 0 0 0 0 0 0 0 ...
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 3.0691 | 0.3117 | 9.8458 | 0 |
| tair_day_livneh_vic | 0.1461 | 0.0080 | 18.1739 | 0 |
| soilmoist1_day_livneh_vic | -0.3157 | 0.0145 | -21.7253 | 0 |
## Area under the curve: 0.9017
| | Predicted 0 | Predicted 1 | Total |
|---|---|---|---|
| Actual 0 | 3265 | 659 | 3924 |
| Actual 1 | 692 | 3420 | 4112 |
| Total | 3957 | 4079 | 8036 |
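The coefficient table, AUC, and confusion matrix above are the kind of output a binomial `glm` produces. The following is a minimal sketch on synthetic data, not the original analysis: the data frame `daily`, its column names, the simulated effect sizes, and the 0.5 cutoff are all assumptions made for illustration. The AUC is computed with the rank formula in base R rather than with a dedicated package.

```r
# Synthetic stand-in for the daily data; names and effect sizes are hypothetical.
set.seed(1)
n_days <- 2000
daily <- data.frame(
  temp = rnorm(n_days, mean = 14, sd = 7),
  soil = rnorm(n_days, mean = 20, sd = 3)
)
# Simulate a large-fire indicator that is more likely on hot, dry days.
daily$fire <- rbinom(n_days, 1, plogis(3 + 0.15 * daily$temp - 0.3 * daily$soil))

# Logistic regression of the fire indicator on temperature and soil moisture.
fit <- glm(fire ~ temp + soil, data = daily, family = binomial)

# Predicted probabilities and a 0.5 cutoff (the cutoff is an assumption).
p_hat <- predict(fit, type = "response")
pred <- as.integer(p_hat > 0.5)

# Confusion matrix with row and column totals.
conf_mat <- addmargins(table(Actual = daily$fire, Predicted = pred))

# AUC via the rank (Mann-Whitney) formula, using base R only.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}
auc_value <- auc(p_hat, daily$fire)

print(conf_mat)
print(auc_value)
```

For reference, the confusion matrix reported above corresponds to an accuracy of (3265 + 3420) / 8036, or about 83.2%.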
To make the model more useful in practice, we also fit models that predict the probability of a large fire on the next day:
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 2.4707 | 0.3245 | 7.6148 | 0 |
| tair_day_livneh_vic | 0.1508 | 0.0081 | 18.6813 | 0 |
| soilmoist1_day_livneh_vic | -0.2743 | 0.0158 | -17.3927 | 0 |
| rainfall_day_livneh_vic | -0.1312 | 0.0256 | -5.1318 | 0 |
## Area under the curve: 0.9002
| | Predicted 0 | Predicted 1 | Total |
|---|---|---|---|
| Actual 0 | 3244 | 680 | 3924 |
| Actual 1 | 689 | 3423 | 4112 |
| Total | 3933 | 4103 | 8036 |
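A next-day model pairs each day's predictors with the following day's fire label. The sketch below shows one possible way to build such a table by shifting the label column. The data frame `daily`, its columns, and its values are hypothetical, and this is not necessarily how the data above were prepared.

```r
# Hypothetical daily table; column names and values are placeholders.
daily <- data.frame(
  date = as.Date("1992-01-01") + 0:5,
  temp = c(3.8, 4.2, 4.2, 4.9, 4.9, 5.3),
  fire = c(0, 0, 1, 0, 1, 1)
)

# Pair today's predictors with tomorrow's label by shifting the label up one row.
# The last day has no "tomorrow", so its label is NA and the row is dropped.
daily$fire_next <- c(daily$fire[-1], NA)
next_day <- daily[!is.na(daily$fire_next), c("date", "temp", "fire_next")]

print(next_day)
```

The resulting `next_day` table can then be passed to `glm` in the same way as a same-day table.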
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -3.7380 | 0.0821 | -45.5091 | 0 |
| tair_day_livneh_vic | 0.2833 | 0.0058 | 48.6031 | 0 |
## Area under the curve: 0.8799
| | Predicted 0 | Predicted 1 | Total |
|---|---|---|---|
| Actual 0 | 3222 | 702 | 3924 |
| Actual 1 | 780 | 3332 | 4112 |
| Total | 4002 | 4034 | 8036 |
The results suggest that, with today's temperature, rainfall, and soil moisture data, we obtain an AUC of 0.90 for predicting a large fire on the next day. If we use a more convenient model that requires only temperature to predict the probability, the AUC is 0.88.
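To make the temperature-only model concrete, the sketch below plugs the coefficients from its table into the logistic function. It assumes the model is a standard binomial logit; the helper name `p_fire` and the example temperatures are hypothetical. The temperature unit is whatever `tair_day_livneh_vic` is recorded in.

```r
# Coefficients copied from the temperature-only table above.
b0 <- -3.7380
b1 <- 0.2833

# Predicted probability of a large fire as a function of temperature.
p_fire <- function(temp) plogis(b0 + b1 * temp)

# Temperature at which the predicted probability crosses 0.5.
temp_50 <- -b0 / b1

# Multiplicative change in the odds per one-unit rise in temperature.
odds_ratio <- exp(b1)

print(round(c(temp_50 = temp_50, odds_ratio = odds_ratio), 2))
print(round(p_fire(c(5, 15, 25)), 3))
```

Under these assumptions the predicted probability passes 0.5 at a temperature of roughly 13.2, and each additional unit of temperature multiplies the odds of a large fire by about 1.33.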
Multinomial Regression Models
With our training data, we achieve an accuracy of 50.9% when predicting the fire size class of a wildfire from that month's conditions. Our confusion matrix suggests that the model is biased towards predicting Class A, which makes sense because smaller fires are much more frequent than larger fires.
## # weights: 35 (24 variable)
## initial value 239251.598736
## iter 10 value 125437.901830
## iter 20 value 124532.532538
## iter 30 value 120485.932245
## iter 40 value 120476.427305
## final value 120476.064543
## converged
## Call:
## multinom(formula = FIRE_SIZE_CLASS ~ tair_day_livneh_vic + soilmoist1_day_livneh_vic +
## rainfall_day_livneh_vic, data = train)
##
## Coefficients:
## (Intercept) tair_day_livneh_vic soilmoist1_day_livneh_vic
## B 0.1892703 -0.004965024 -0.01985095
## C -2.0204514 0.021987337 -0.04766961
## D -3.6520631 0.027847686 -0.04952835
## E -5.2618931 0.065944306 -0.02882023
## F -6.1100380 0.078510150 -0.01594256
## G -5.9574992 0.079728252 -0.08359946
## rainfall_day_livneh_vic
## B -0.1174625
## C -0.1227349
## D -0.1217483
## E -0.1194019
## F -0.2158551
## G -0.4841868
##
## Std. Errors:
## (Intercept) tair_day_livneh_vic soilmoist1_day_livneh_vic
## B 0.08108142 0.001704614 0.004248135
## C 0.18703376 0.003905099 0.010000882
## D 0.39395225 0.008220909 0.021148114
## E 0.55213418 0.011625972 0.029628342
## F 0.70425758 0.014898038 0.037742353
## G 1.04264043 0.021837214 0.057950793
## rainfall_day_livneh_vic
## B 0.007831189
## C 0.020920508
## D 0.044945457
## E 0.063238753
## F 0.091410098
## G 0.180826693
##
## Residual Deviance: 240952.1
## AIC: 241000.1
## [1] 50.94
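The coefficients above can be turned into class probabilities with the multinomial logit (softmax) formula, taking Class A as the reference level with all coefficients equal to zero. The sketch below copies the printed coefficients; the function name `class_probs`, the short column names, and the input values are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the multinom output above (rows B to G;
# class A is the reference level, with all coefficients equal to zero).
coefs <- rbind(
  B = c( 0.1892703, -0.004965024, -0.01985095, -0.1174625),
  C = c(-2.0204514,  0.021987337, -0.04766961, -0.1227349),
  D = c(-3.6520631,  0.027847686, -0.04952835, -0.1217483),
  E = c(-5.2618931,  0.065944306, -0.02882023, -0.1194019),
  F = c(-6.1100380,  0.078510150, -0.01594256, -0.2158551),
  G = c(-5.9574992,  0.079728252, -0.08359946, -0.4841868)
)
colnames(coefs) <- c("(Intercept)", "temp", "soil", "rain")

# Softmax over the linear predictors, with the reference class fixed at 0.
class_probs <- function(temp, soil, rain) {
  x <- c(1, temp, soil, rain)
  eta <- c(A = 0, drop(coefs %*% x))
  exp(eta) / sum(exp(eta))
}

# Hypothetical conditions: warm, moderately dry soil, no rain.
probs <- class_probs(temp = 20, soil = 18, rain = 0)
print(round(probs, 3))
print(names(which.max(probs)))
```

For these example inputs the most probable class is A, which is consistent with the bias towards Class A noted earlier.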
On the testing data, we achieve a similar accuracy of 50.9%.
## [1] 50.86
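Because one class dominates, it helps to compare the accuracy against the share of the most frequent class. The following is a minimal sketch on synthetic data, assuming the `nnet` package is installed; the data frame `fires`, the three-level class, the 80/20 split, and the helper `accuracy` are hypothetical and do not reproduce the analysis above.

```r
library(nnet)  # provides multinom(); a recommended package in most R installs

set.seed(42)
n_obs <- 1500
fires <- data.frame(temp = rnorm(n_obs, 15, 7), rain = rexp(n_obs, 1))
# Hypothetical three-level size class, with small fires most common.
score <- 0.05 * fires$temp - 0.3 * fires$rain + rnorm(n_obs)
fires$size_class <- cut(score, breaks = c(-Inf, 0.8, 1.8, Inf),
                        labels = c("A", "B", "C"))

# 80/20 train/test split (the ratio is an assumption).
train_idx <- sample(n_obs, size = 0.8 * n_obs)
train <- fires[train_idx, ]
test <- fires[-train_idx, ]

fit <- multinom(size_class ~ temp + rain, data = train, trace = FALSE)

# Percentage of observations whose predicted class matches the true class.
accuracy <- function(model, data) {
  round(100 * mean(predict(model, newdata = data) == data$size_class), 2)
}
train_acc <- accuracy(fit, train)
test_acc <- accuracy(fit, test)

# Accuracy of always predicting the most frequent class, for comparison.
baseline_acc <- round(100 * max(prop.table(table(test$size_class))), 2)

print(c(train = train_acc, test = test_acc, baseline = baseline_acc))
```

If the model accuracy is close to the baseline, most of the accuracy comes from the class imbalance rather than from the predictors.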
Now, we try to predict the cause of the wildfire based on the conditions of that month.
## # weights: 78 (60 variable)
## initial value 450638.517563
## iter 10 value 351990.198964
## iter 20 value 350236.347991
## iter 30 value 349300.527741
## iter 40 value 348573.289051
## iter 50 value 347430.734728
## iter 60 value 339582.375984
## iter 70 value 338693.455512
## iter 80 value 338687.765856
## final value 338687.730830
## converged
## Call:
## multinom(formula = STAT_CAUSE_CODE ~ month + tair_day_livneh_vic +
## soilmoist1_day_livneh_vic + rainfall_day_livneh_vic, data = data_wildfire)
##
## Coefficients:
## (Intercept) month tair_day_livneh_vic soilmoist1_day_livneh_vic
## 2 8.1322214 -0.1577226 -0.2580958 -0.07582529
## 3 6.7364952 -0.1506793 -0.2671677 -0.10486743
## 4 7.0614347 -0.1431805 -0.2808147 -0.08389730
## 5 7.0942085 -0.1841204 -0.3386374 0.03907005
## 6 5.0782447 -0.1815844 -0.2515738 -0.14392114
## 7 7.4709575 -0.1539434 -0.2673771 -0.06961042
## 8 6.2183748 -0.1596796 -0.2814460 -0.02678882
## 9 8.8417649 -0.1542511 -0.2826069 -0.07942694
## 10 0.5667165 -0.2508201 -0.1473874 -0.01235681
## 11 2.9853740 -0.1429962 -0.2551327 0.03213029
## 12 3.4646714 -0.1985265 -0.3108203 -0.07492545
## 13 4.8432777 -0.1762985 -0.2525367 0.07264006
## rainfall_day_livneh_vic
## 2 -0.4584508
## 3 -0.5247959
## 4 -0.3523235
## 5 -0.5507486
## 6 -0.5633070
## 7 -0.4611825
## 8 -0.5196944
## 9 -0.4275431
## 10 -0.8616573
## 11 -0.3378871
## 12 -0.5633781
## 13 -0.5864476
##
## Std. Errors:
## (Intercept) month tair_day_livneh_vic soilmoist1_day_livneh_vic
## 2 0.1750034 0.006421453 0.003238870 0.008398485
## 3 0.2784172 0.009429017 0.004994638 0.013341810
## 4 0.2314689 0.007877799 0.004249215 0.011052364
## 5 0.1985661 0.006814905 0.003768922 0.009445766
## 6 0.7188434 0.023927300 0.012440760 0.034873084
## 7 0.1964082 0.006987880 0.003617004 0.009397018
## 8 0.2483275 0.008321274 0.004586986 0.011780972
## 9 0.1697367 0.006247095 0.003158474 0.008140895
## 10 1.4665606 0.054808849 0.026711216 0.069498223
## 11 0.4875786 0.014975748 0.009213650 0.022605061
## 12 1.5695856 0.047477926 0.028406177 0.075303662
## 13 0.2225888 0.007652354 0.004187189 0.010472280
## rainfall_day_livneh_vic
## 2 0.01212778
## 3 0.02501050
## 4 0.01548083
## 5 0.01385956
## 6 0.07684880
## 7 0.01429787
## 8 0.01979849
## 9 0.01112386
## 10 0.19991617
## 11 0.02912340
## 12 0.15234282
## 13 0.01731968
##
## Residual Deviance: 677375.5
## AIC: 677495.5
With our training data, we achieve an accuracy of 30%. Looking at our confusion matrix for this model, it is a bit less biased towards predicting a certain category. This makes sense because there does not seem to be one predominant cause of the wildfires in our dataset.
## [1] 30.04
On the test data, we achieve a similar accuracy of 30%.
## [1] 30.05
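The multinom output above lists coefficients and standard errors but no p-values. One common way to judge individual coefficients is a Wald z-test, sketched below for the soil-moisture coefficients of a few cause codes. The values are copied from the output above; the normal approximation and the choice of codes are assumptions made for illustration.

```r
# Soil-moisture coefficients and standard errors copied from the output above
# for cause codes 2, 5, 10, and 12.
est <- c(`2` = -0.07582529, `5` = 0.03907005, `10` = -0.01235681, `12` = -0.07492545)
se  <- c(`2` = 0.008398485, `5` = 0.009445766, `10` = 0.069498223, `12` = 0.075303662)

# Wald z statistics and two-sided p-values under a normal approximation.
z <- est / se
p_value <- 2 * pnorm(-abs(z))

print(data.frame(estimate = est, std_error = se,
                 z = round(z, 2), p = signif(p_value, 3)))
```

In this sketch the coefficients for codes 10 and 12 are small relative to their standard errors, so the soil-moisture effect for those causes is not clearly distinguishable from zero.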
We also wanted to evaluate the long-term effect of natural factors on wildfires. We tried to predict the number of wildfires, the average fire size, and the total burned area from these natural factors.
From the corrplot, we find that the number of cases is strongly correlated with the environmental variables, while the average fire size and the total area burned show some correlation with them. We first fit linear models to these response variables, and the results suggested that the models did not fit well. We then took the log of each response variable, which improved the models.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 8.8261 | 0.2806 | 31.4571 | 0 |
| tair_day_livneh_vic | 0.0526 | 0.0068 | 7.7415 | 0 |
| soilmoist1_day_livneh_vic | -0.2322 | 0.0126 | -18.4468 | 0 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 6.4512 | 0.8955 | 7.2041 | 0.0000 |
| tair_day_livneh_vic | 0.0409 | 0.0217 | 1.8858 | 0.0604 |
| soilmoist1_day_livneh_vic | -0.2994 | 0.0402 | -7.4540 | 0.0000 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 15.2773 | 0.9527 | 16.0351 | 0e+00 |
| tair_day_livneh_vic | 0.0936 | 0.0231 | 4.0523 | 1e-04 |
| soilmoist1_day_livneh_vic | -0.5316 | 0.0427 | -12.4385 | 0e+00 |
The R-squared values are 0.9 for the model of cases, 0.5377 for the model of average fire size, and 0.7825 for the model of total burned area. From the diagnostic plots, we found that the case and burned-area models satisfy the linear model assumptions well.
We then build models to predict the number of large wildfires, their average fire size, and their total burned area.
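The comparison between a raw and a log-transformed response, together with the diagnostic plots, can be sketched as follows. This is an illustration on synthetic data: the data frame `monthly`, its columns, and the simulated coefficients are hypothetical and do not reproduce the models above.

```r
set.seed(7)
n_months <- 240
monthly <- data.frame(
  temp = runif(n_months, 0, 25),
  soil = runif(n_months, 15, 28)
)
# Hypothetical monthly fire counts generated on a multiplicative (log) scale.
monthly$n <- exp(8 + 0.05 * monthly$temp - 0.2 * monthly$soil +
                 rnorm(n_months, sd = 0.4))

# Same predictors, raw response versus log-transformed response.
fit_raw <- lm(n ~ temp + soil, data = monthly)
fit_log <- lm(log(n) ~ temp + soil, data = monthly)

r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
print(round(c(raw = r2_raw, log = r2_log), 3))

# Standard diagnostic plots (residuals vs fitted, Q-Q, scale-location, leverage).
pdf(NULL)  # null device, so the sketch also runs non-interactively
par(mfrow = c(2, 2))
plot(fit_log)
invisible(dev.off())
```

Note that R-squared values for a raw and a log response are measured on different scales, so the residual and Q-Q plots are a better basis for choosing between the two models than the R-squared values alone.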
##
## Call:
## lm(formula = log(n) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic,
## data = joined4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.82435 -0.40529 -0.01388 0.36890 1.68090
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.16436 0.47725 12.92 < 2e-16 ***
## tair_day_livneh_vic 0.09035 0.01117 8.09 2.67e-14 ***
## soilmoist1_day_livneh_vic -0.28469 0.02194 -12.98 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5844 on 248 degrees of freedom
## Multiple R-squared: 0.8616, Adjusted R-squared: 0.8605
## F-statistic: 772 on 2 and 248 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(FIRE_SIZE) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic,
## data = joined4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1378 -0.7470 -0.1646 0.5175 5.0328
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.44918 0.93222 7.991 5.06e-14 ***
## tair_day_livneh_vic 0.04067 0.02182 1.864 0.0635 .
## soilmoist1_day_livneh_vic -0.17695 0.04285 -4.130 4.96e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.141 on 248 degrees of freedom
## Multiple R-squared: 0.3366, Adjusted R-squared: 0.3313
## F-statistic: 62.92 on 2 and 248 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(totalarea) ~ tair_day_livneh_vic + +soilmoist1_day_livneh_vic,
## data = joined4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0838 -0.8558 -0.1446 0.8036 4.5455
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.61355 1.10190 12.355 < 2e-16 ***
## tair_day_livneh_vic 0.13102 0.02579 5.081 7.39e-07 ***
## soilmoist1_day_livneh_vic -0.46165 0.05064 -9.115 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.349 on 248 degrees of freedom
## Multiple R-squared: 0.7391, Adjusted R-squared: 0.737
## F-statistic: 351.2 on 2 and 248 DF, p-value: < 2.2e-16
The R-squared values are 0.8616 for the model of large-fire cases, 0.3366 for the model of average large-fire size, and 0.7391 for the model of total burned area. From the diagnostic plots, we found that the model of large-fire cases does not fit well because its residuals are not consistent. The Q-Q plot for the model of average large-fire size also shows some departure from normality.
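Because the response is log-transformed, the coefficients act multiplicatively on the original scale. The sketch below converts the two slope coefficients of the `log(n)` model above into approximate percentage changes. The variable names are hypothetical, and the interpretation assumes the other predictor is held fixed.

```r
# Slope coefficients copied from the log(n) model output above.
b_temp <- 0.09035
b_soil <- -0.28469

# On the log scale a one-unit change in a predictor adds the coefficient,
# so on the original scale it multiplies the typical count by exp(coefficient).
mult_temp <- exp(b_temp)
mult_soil <- exp(b_soil)

# Express the multipliers as percentage changes per one-unit increase.
pct_change <- 100 * (c(temp = mult_temp, soil = mult_soil) - 1)
print(round(pct_change, 1))
```

Under this reading, a one-unit rise in temperature goes with roughly 9% more large fires, and a one-unit rise in soil moisture with roughly 25% fewer.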
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 7.7485 | 0.3945 | 19.6405 | 0.0000 |
| Avg_Temp | 0.0633 | 0.0090 | 7.0254 | 0.0000 |
| Avg_SoilMoisture | -0.2239 | 0.0194 | -11.5300 | 0.0000 |
| Avg_Rainfall | -0.1092 | 0.0386 | -2.8244 | 0.0051 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 7.0246 | 0.3338 | 21.0438 | 0.0000 |
| Avg_Temp | 0.0549 | 0.0076 | 7.2086 | 0.0000 |
| Avg_SoilMoisture | -0.2067 | 0.0164 | -12.5777 | 0.0000 |
| Avg_Rainfall | -0.0692 | 0.0327 | -2.1173 | 0.0352 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 7.0687 | 1.1570 | 6.1097 | 0.0000 |
| Avg_Temp | 0.0338 | 0.0264 | 1.2799 | 0.2017 |
| Avg_SoilMoisture | -0.3423 | 0.0570 | -6.0112 | 0.0000 |
| Avg_Rainfall | -0.1353 | 0.1133 | -1.1937 | 0.2337 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.7586 | 1.1950 | 3.9821 | 0.0001 |
| Avg_Temp | 0.0768 | 0.0273 | 2.8150 | 0.0053 |
| Avg_SoilMoisture | -0.2591 | 0.0588 | -4.4046 | 0.0000 |
| Avg_Rainfall | -0.0363 | 0.1171 | -0.3104 | 0.7565 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 14.8171 | 1.2407 | 11.9421 | 0.0000 |
| Avg_Temp | 0.0971 | 0.0283 | 3.4273 | 0.0007 |
| Avg_SoilMoisture | -0.5663 | 0.0611 | -9.2714 | 0.0000 |
| Avg_Rainfall | -0.2445 | 0.1216 | -2.0112 | 0.0453 |
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 11.7832 | 1.2892 | 9.1400 | 0.0000 |
| Avg_Temp | 0.1318 | 0.0294 | 4.4758 | 0.0000 |
| Avg_SoilMoisture | -0.4658 | 0.0635 | -7.3394 | 0.0000 |
| Avg_Rainfall | -0.1056 | 0.1263 | -0.8360 | 0.4039 |